Pandas Exercise

LuizZ
Posts: 1
Joined: Tue May 12, 2020 12:28 am

Pandas Exercise

Post by LuizZ »

Hi, I am a beginner and I cannot figure out how to do the Pandas exercise ("Generate the Donut Dataset"). Maybe someone could help me out?

Here is what I did:

Code: Select all

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

X = np.random.randn(10000, 2)
y = np.zeros(10000)

df = pd.DataFrame(data=X,
                  index=np.arange(10000),
                  columns=[['x1', 'x2']])
Up to this point everything was fine: I managed to create the first data frame. I then failed to create the new columns for the quadratic feature expansion, i.e. the square of each original column and the product of the two. I tried two different ways:

First, I tried it the way the instructor shows us, with the apply() function:

Code: Select all

                  
def x1_2(row):
  return row['x1']**2
df['x1_2'] = df.apply(x1_2, axis=0)
Then I got the error message: KeyError: 'x1'

Second, I searched the Pandas documentation, as instructed, and tried:

Code: Select all

df['x1_2'] = df['x1']**2
df['x2_2'] = df['x2']**2
df['x1x2'] = df['x1']*df['x2']
Then I got the error message: TypeError: only integer scalar arrays can be converted to a scalar index.

Any ideas about how to perform the quadratic feature expansion (i.e. generate 3 new columns containing x1^2, x2^2 and x1*x2)?
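Both error messages are consistent with two details in the snippets above, though this is a reading of the code rather than a confirmed diagnosis. First, `columns=[['x1', 'x2']]` is a nested list, which pandas turns into MultiIndex columns. Second, `axis=0` makes `apply()` pass whole columns instead of rows, so `row['x1']` looks for an index label that does not exist. A minimal sketch with a flat column list and vectorized arithmetic follows; the seed and the use of `default_rng` are assumptions added for reproducibility.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
X = rng.standard_normal((10000, 2))

# A nested list of names produces MultiIndex columns, which is likely
# what broke df['x1'] in the original snippet.
nested = pd.DataFrame(X, columns=[['x1', 'x2']])
print(type(nested.columns).__name__)

# A flat list gives ordinary string column labels.
df = pd.DataFrame(X, columns=['x1', 'x2'])

# Quadratic feature expansion with vectorized column arithmetic.
df['x1_2'] = df['x1'] ** 2
df['x2_2'] = df['x2'] ** 2
df['x1x2'] = df['x1'] * df['x2']

# The apply() route also works once axis=1 passes one row at a time.
df['x1_2_apply'] = df.apply(lambda row: row['x1'] ** 2, axis=1)
```

The vectorized form is usually preferable: it is shorter and avoids calling a Python function once per row.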
EFortier
Posts: 4
Joined: Sat Jun 06, 2020 2:21 pm

Re: Pandas Exercise

Post by EFortier »

Here's what I have so far:

Code: Select all

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

x1 = (np.random.random(2000)-0.5)*30
x2 = (np.random.random(2000)-0.5)*30

A = np.column_stack((x1,x2))
df = pd.DataFrame(A, columns=['x1','x2'])

def pwr2(e):
  return e*e
def mult(e,f):
  return e*f

df['x1^2'] = df['x1'].apply(pwr2)
df['x2^2'] = df['x2'].apply(pwr2)

df['x1*x2'] = df.apply(lambda x: mult(x['x1'], x['x2']),axis=1)
I generate the data from the uniform distribution on [0, 1), subtract 0.5 to center it on zero (so roughly half the values become negative), then multiply by 30 to stretch the range to -15 to 15, simply because the dataset presented seemed to have data points in that range. Then I use numpy's column_stack function to stack the two arrays as columns, and build the dataframe from the result.

Now the ugly part: I define functions for squaring and multiplication, as the other methods I tried failed. Next I apply these functions to x1 and x2.

The only thing missing: I have no idea how these values can lead to concentric circles. My guess is that the data selection is flawed, but this seems more of a mathematical question than a coding one. Any ideas?
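On the mathematical side: points drawn uniformly from a square fill the square, so no rings can appear. One common way to get concentric circles is to sample a radius and an angle and convert them to Cartesian coordinates. The sketch below is not the course's official solution; the radii (5 and 10), the noise level, the seed and the helper name `make_ring` are all assumptions for illustration. It also hints at why the squared columns matter: x1^2 + x2^2 is the squared radius, so the two rings largely separate on that sum.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

def make_ring(radius, n, noise=0.5):
    """Hypothetical helper: n points scattered around a circle of the given radius."""
    r = radius + noise * rng.standard_normal(n)   # noisy radius
    theta = rng.uniform(0, 2 * np.pi, n)          # uniform angle
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

inner = make_ring(5, 1000)
outer = make_ring(10, 1000)

df = pd.DataFrame(np.vstack((inner, outer)), columns=['x1', 'x2'])
df['y'] = np.repeat([0, 1], 1000)  # 0 = inner ring, 1 = outer ring

# Squared distance from the origin, built from the quadratic features.
df['r_squared'] = df['x1'] ** 2 + df['x2'] ** 2
print(df.groupby('y')['r_squared'].mean())
```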
Monster
Posts: 2
Joined: Sun Jun 07, 2020 9:36 pm

Re: Pandas Exercise

Post by Monster »

Hi guys!

Here are my two cents. I don't think I have approached the exercise in the intended manner (so please be careful not to let me guide you in the wrong direction), but I've tried to get as close as I could. My main problem is generating the random data points. Everything after that seems to work as intended. Here's my code. If anybody has figured out how to generate the data the proper way, please let me know!

Code: Select all

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

y = np.random.uniform(-10,10,1000)
df = pd.DataFrame(data = y, columns = ['y'])

def function_x1(row):
    y = row['y']
    x1 = np.sqrt(100 - y**2)
    return x1

df['x1'] = df.apply(function_x1, axis = 1)

def function_x2(row):
    y = row['y']
    x2 = -np.sqrt(100 - y**2)
    return x2

df['x2'] = df.apply(function_x2, axis = 1)

def squared_x1(row):
    x1 = row['x1']
    x1_squared = x1**2
    return x1_squared

df['x1^2'] = df.apply(squared_x1, axis = 1)

def squared_x2(row):
    x2 = row['x2']
    x2_squared = x2**2
    return x2_squared

df['x2^2'] = df.apply(squared_x2, axis = 1)

def x1_times_x2(row):
    x1 = row['x1']
    x2 = row['x2']
    x1_times_x2 = x1*x2
    return x1_times_x2

df['x1*x2'] = df.apply(x1_times_x2, axis = 1)

plt.scatter(df['x1'],df['y']);
plt.scatter(df['x2'],df['y']);

df.columns.values

# reorder the columns; assigning to df.columns would only relabel them in place
df = df[['x1', 'x2', 'x1^2', 'x2^2', 'x1*x2', 'y']]
df

df.to_csv('Pandas Exercise', index=False, header=False)
Have a great day!
duketan93
Posts: 1
Joined: Mon Jun 08, 2020 3:32 pm

Re: Pandas Exercise

Post by duketan93 »

Hi guys! I am attempting this exercise and would like to share my findings. I am not sure I understand the exercise correctly. Nevertheless, this is the best I could produce after long hours of trying today. I would like to hear your thoughts and comments.

https://github.com/duketan93/Pandas-Exercise.git
Last edited by duketan93 on Tue Jun 09, 2020 2:51 am, edited 1 time in total.
zg1seg
Posts: 2
Joined: Mon Jul 20, 2020 10:51 pm

Re: Pandas Exercise

Post by zg1seg »

Hi guys!

I think I did it. Hopefully, this solution will help someone.
But does anybody know whether there is a built-in way to generate a donut distribution? Or at least to convert polar->cartesian->polar coordinates?

Code: Select all

import numpy as np
import pandas as pd
import matplotlib.pyplot as plot


def create_donut(radius, size=1000):
    # polar coordinates: evenly spaced angles, noisy radius centered on `radius`
    theta = np.linspace(0, 2 * np.pi, size)
    r = np.random.randn(size) + radius
    # convert polar -> cartesian
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))


outerCircle = create_donut(10)
innerCircle = create_donut(5)

dfo = pd.DataFrame(outerCircle, columns=["x1", "x2"])
dfo["y"] = 1
dfi = pd.DataFrame(innerCircle, columns=["x1", "x2"])
dfi["y"] = 0

# generate DataFrame for result csv
df_result = pd.concat([dfi, dfo], ignore_index=True)

df_result["x1^2"] = df_result["x1"] ** 2
df_result["x2^2"] = df_result["x2"] ** 2
df_result["x1*x2"] = df_result["x1"] * df_result["x2"]

# rearrange columns
df_result = df_result[["x1", "x2", "x1^2", "x2^2", "x1*x2", "y"]]
# shuffle to mix up "y" values
df_result = df_result.sample(frac=1.0)

df_result.to_csv("result.csv", header=False, index=False)

# plot
ax = dfo.plot(x=0, y=1, kind="scatter", color="gold")
dfi.plot(x=0, y=1, kind="scatter", color="indigo", ax=ax, figsize=(5, 5), legend=False)

plot.show()
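On the conversion question: NumPy does not appear to ship a dedicated polar/Cartesian converter, but standard functions cover both directions. The sketch below uses `np.hypot` and `np.arctan2` for the return trip and shows the complex-number route as an alternative. The helper names `pol2cart` and `cart2pol` are made up for this example.

```python
import numpy as np

def pol2cart(r, theta):
    """Hypothetical helper: polar (r, theta) -> Cartesian (x, y)."""
    return r * np.cos(theta), r * np.sin(theta)

def cart2pol(x, y):
    """Hypothetical helper: Cartesian (x, y) -> polar (r, theta), theta in (-pi, pi]."""
    return np.hypot(x, y), np.arctan2(y, x)

r = np.array([1.0, 2.0, 5.0])
theta = np.array([0.0, np.pi / 2, -np.pi / 4])

x, y = pol2cart(r, theta)
r_back, theta_back = cart2pol(x, y)

# Alternative: treat each point as a complex number r * e^(i*theta).
z = r * np.exp(1j * theta)
r_from_z, theta_from_z = np.abs(z), np.angle(z)
```

Note that `np.arctan2` returns angles in (-pi, pi], so an angle given in [0, 2*pi) may come back shifted by a full turn.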
Murloc Holmes
Posts: 1
Joined: Thu Mar 18, 2021 9:04 pm

Re: Pandas Exercise

Post by Murloc Holmes »

Very nice zg1seg. Thanks for posting. Enjoying this course although this problem was beyond me. Hopefully not a sign that future Lazy Programmer courses will go over my head. :lol:
BabaKirtos
Posts: 5
Joined: Sat Apr 03, 2021 1:00 pm

Re: Pandas Exercise

Post by BabaKirtos »

Hello, I was able to generate the donut scatter plot by converting a polar array to a cartesian array. However, when I tried to generate a simple circle using the equation of a circle centered at the origin, x^2 + y^2 = r^2, I was not able to. When I use y = np.sqrt(r**2 - x**2), where x is a linspace array from -radius to +radius, I only get the positive values for y, i.e. a semicircle in the 1st and 2nd quadrants. How can I represent this equation in a 2D array? Any help would be very much appreciated. Thank you.
lazyprogrammer
Site Admin
Posts: 49
Joined: Sat Jul 28, 2018 3:46 am

Re: Pandas Exercise

Post by lazyprogrammer »

BabaKirtos wrote: Mon Apr 05, 2021 5:09 pm Hello, I was able to generate the donut scatter plot by converting a polar array to a cartesian array. However, when I tried to generate a simple circle using the equation of a circle centered at the origin, x^2 + y^2 = r^2, I was not able to. When I use y = np.sqrt(r**2 - x**2), where x is a linspace array from -radius to +radius, I only get the positive values for y, i.e. a semicircle in the 1st and 2nd quadrants. How can I represent this equation in a 2D array? Any help would be very much appreciated. Thank you.
The "equation" should have a positive and negative square root. ;)
BabaKirtos
Posts: 5
Joined: Sat Apr 03, 2021 1:00 pm

Re: Pandas Exercise

Post by BabaKirtos »

I persevered, sensei! :D
Donut is a little thin near the x axis, but still delicious :lol:

Code: Select all

import numpy as np
import matplotlib.pyplot as plt

radius = 5
points = 200

# repeat each x value twice so it can hold both the positive and negative y
x = np.repeat(np.linspace(-radius,radius,num=points),2)

y = np.zeros(len(x))

# generating y array with positive and negative sqrt values
for n in range(len(x)):
  if n % 2 == 0:
    y[n] = (radius**2 - x[n]**2)**0.5
  else:
    y[n] = -((radius**2 - x[n]**2)**0.5)

# adding uniform random noise in [0, 1) to y
arr = (np.random.random(len(x)) + y)

# plotting x and arr
plt.scatter(x,arr)
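The thin band near the x axis probably has two causes. The noise is added only to y, so where the circle is nearly vertical a vertical shift barely moves a point off the curve. In addition, points spaced evenly in x are spread thinly along the arc near x = ±radius. A sketch of an alternative that builds both square-root branches without a loop and perturbs the radius instead; the noise scale and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
radius = 5
points = 200

x = np.linspace(-radius, radius, num=points)
half = np.sqrt(radius**2 - x**2)  # upper branch of the circle

# Both branches of y = ±sqrt(r^2 - x^2), stacked into one (400, 2) array.
circle = np.column_stack((np.concatenate((x, x)),
                          np.concatenate((half, -half))))

# Radial noise: scale each point's distance from the origin by a random factor.
scale = 1 + 0.1 * rng.standard_normal(len(circle))
donut = circle * scale[:, np.newaxis]
```

`plt.scatter(donut[:, 0], donut[:, 1])` would plot the result. The points are still evenly spaced in x rather than in angle, so sampling the angle directly should give more uniform coverage around the ring.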